Mandarin Topic-oriented Conversations
Abstract
This paper describes the collection and processing of a pilot speech corpus annotated with dialogue acts. The Mandarin Topic-oriented Conversational Corpus (MTCC) consists of annotated transcripts and sound files of conversations between two speakers who know each other. Particular features of spoken Mandarin, such as discourse particles and paralinguistic sounds, are taken into account in the orthographic transcription. In addition, the dialogue structure is annotated using a scheme developed for topic-specific conversations. Using the annotated materials, we present the results of a preliminary analysis of dialogue structure and dialogue acts. Related transcription tools and web query applications are also introduced.
Similar resources
HKUST/MTS: A Very Large Scale Mandarin Telephone Speech Corpus
The paper describes the design, collection, transcription and analysis of 200 hours of HKUST Mandarin Telephone Speech Corpus (HKUST/MTS) from over 2100 Mandarin speakers in mainland China under the DARPA EARS framework. The corpus includes speech data, transcriptions and speaker demographic information. The speech data include 1206 ten-minute natural Mandarin conversations between either stran...
Absolute and relative entrainment in Mandarin conversations
Based on the Tongji Games Corpus, this study analyzes acoustic-prosodic entrainment in Mandarin conversations. Analyses have been accomplished at the levels of conversation, turn, and tone unit. Absolute entrainment in prosody is found at the levels of conversation and turn, and relative entrainment is found over tones. Therefore, this study identifies evidence for the existence of two kinds of ent...
Improving Language Models for Mandarin Conversational Speech Recognition with Web Data
Lack of data is a problem in training language models for conversational speech recognition, particularly for languages other than English. Experiments in English have successfully used web-based text collection targeted for a conversational style to augment small sets of transcribed speech; here we look at extending these techniques to Mandarin. In addition, we investigate different techniques ...
A word- and turn-oriented approach to exploring the structure of Mandarin dialogues
This paper investigates the structure of Mandarin spoken dialogues by analysing the distribution of words and turns used in dialogues. The results of an empirical quantitative study show that independent of speakers, there exists a kind of basic vocabulary for daily Mandarin conversations. It is proposed that this is the minimal set of a lexicon for the use of spoken Mandarin. Moreover, a numbe...
Harmony and Tension in Mandarin Chinese Prosody: Constraints and Opportunities of Lexical Tones in Discourse Markers
Prosody in tonal languages such as Mandarin provides a fascinating test on the universal character and attributes of prosody in natural language. Prosodic variation is a key element in marking intention, cognitive states, and topic development, and answers to how these communicative goals are accomplished in Mandarin provide enlightening discoveries on the universality of prosodic forces and sh...
Journal: IJCLCLP
Volume: 10
Publication date: 2005